
Correcting Correlations When Predicting Success In College



Joe L. Saupe 
Professor Emeritus 
University of Missouri 

Mardy T. Eimers
Director, Institutional Research
University of Missouri




Association for 
Institutional Research 

Supporting quality data and 
decisions for higher education. 



IR Applications 

Volume 31, June 10, 2011

Using Advanced Tools, Techniques, and Methodologies 



Abstract 

Critics of testing for admission purposes cite the moderate 
correlations of admissions test scores with success in college. In 
response, this study applies formulas from classical measurement 
theory to observed correlations to correct for restricted variances in 
predictor and success variables. Estimates of the correlations in the 
population of high school graduates are derived from two of the 
several formulas in the literature. This article describes limitations and 
encourages additional investigation into the use of the formulas for 
estimating correlations in unrestricted populations. 



Critics of the use of test scores in college admissions cite the 
relatively low correlations between admission test scores and success 
in college, for example, first-year grade point average (GPA). They
point out that a typical correlation of .40 between the admission test
score and end-of-first-year GPA indicates that the test score predicts
only 16% of the variance in the first-year grades. Critics assert that the
percentage of variance is so low that colleges and universities place
too much emphasis on test scores as predictors of college success
(Kohn, 2001; Sternberg, Wagner, Williams, & Horvath, 1995; Vasquez &
Jones, 2006; among others).

On the other hand, defenders of using test scores in admission 
decisions note that some of the criticism may be unwarranted. For 
example, Sackett, Borneman, and Connelly (2008) outlined several 
reasons why test scores might be more effective predictors of college 
success than what is typically reported. First, "variance explained" may 
not be the best metric for interpreting the predictive ability of test 
scores. That is, if you convert "variance explained" to "differences in 
odds of success," test scores predict the likelihood of a subject being 
successful quite well.¹ Second, first-year GPA is not perfectly reliable,
and unreliability in this GPA depresses its correlation
with the test score. Consequently, the observed
correlation between test score and GPA understates
the true relationship between the two variables.
Third, admission tests do a reasonably good job of
predicting grades beyond the first year of college,
in addition to predicting performance on other
dependent variables such as GRE scores, LSAT
scores, doctoral degree achievement, and getting
tenure (Kuncel & Hezlett, 2007; Lubinski, Benbow,
Webb, & Bleske-Rechek, 2006; Vey et al., 2003).
Fourth, restriction of range, that is, examining the
relationship only among students who actually go to
college rather than among all students who take
admission tests, was cited as a major factor in
limiting the predictability of test scores
(Sackett et al., 2008). This final reason, restricted
range, is the focus of the current study.

¹ "Differences in odds of success" refer to differences in the proportions of subjects whose
predicted values from a regression model exceed some specified value (e.g., whose predicted
first-year GPA exceeds 2.0). Sackett et al. (2008) also cite others who present alternative metrics
for expressing the usefulness of tests as predictors that may be more meaningful than that of
variance explained.

What may be overlooked by the critics is that the 
correlations in question are frequently calculated 
using a population of entering freshmen who 
completed a full year of college (Linn, 1990; Young,
2004; Zwick & Sklar, 2005; among others). In judging 
the value of the test in predicting college success, 
a more relevant population is that of all potential 
college students, specifically, that of all high school 
graduates. Clearly, this is a more heterogeneous 
population than the one used in calculating the 
correlations. The implication of this situation is 
that the calculated correlations understate the 
more meaningful correlations for the unrestricted 
population. 

The population of high school graduates is 
initially restricted by the elimination of those who 
do not apply to college and next by those who apply 
but are not accepted and then by those who are 
accepted but do not enroll. Then, there is the group 
that enrolls on a part-time basis and may not be used 
in calculating the correlations of interest. Similarly, 
there is the further restriction from those who enroll, 
but do not complete the first year with a GPA. 

Indicators of success in high school are also used 
as predictors of success in college. As a matter of 
fact, the usual finding is that success in high school 
is a better predictor of success in college than is an 
admission test score (Hoffman, 2002; Munro, 1981; 
Zheng, Saunders, Shelley, & Whalen, 2002; among
others). Further, the combination of test scores
with measures of success in high school provides a 
better prediction of success in college than either 
predictor used individually (Eimers & Pike, 1997; 
Fleming, 2002; Geiser & Santelices, 2007; Kim, 2002; 
Linn, 1990; Mathiasen, 1984; Noble & Sawyer, 1997;
Wolfe & Johnson, 1995; Zwick & Sklar, 2005).

Purpose 

The purpose of this study is to illustrate 
techniques for correcting a correlation between a 
predictor of success in college (admission test score 
or indicator of high school performance) with a 
measure of success in college (one-year retention 
or first-year GPA), given the restricted variances in 
the population used to calculate the correlations. In 
other words, this study demonstrates procedures 
for estimating correlations in the unrestricted 
population (students who attend college and 
students who do not attend college) based upon 
correlations calculated for the restricted population 
(students who attend college). A secondary 
purpose is to set the foundation for and stimulate 
additional studies designed to estimate these 
correlations in other unrestricted higher education 
and college student populations. This study focuses 
on correlations involving admission test scores, 
indicators of success in high school, and first-year 
college GPA. 

Formulas for Correcting Correlations 

Classical measurement theory provides the 
means to "correct" correlations for restrictions in the 
variances of the correlated variables. The restriction 
in the variance of a predictor variable is due to 
selection on that variable or on a related variable or 
to selection on more than one variable. In the case 
of college admissions, students may be selected on 
the basis of an admission test, an indicator of high 
school success, or some combination of the two. In 
correcting correlations for restriction of variance, a 
distinction between explicit selection and implicit 
or incidental selection is made. In order to correct a
correlation, the variance of a relevant variable in the
unrestricted population must be known. Selection
on the basis of the predictor variable of interest, for
which the variance in the unrestricted population
is known, is explicit selection, and that variable
is referred to as the explicitly selected variable.
Selection on the basis of some other variable that 
is related to the predictor variable of interest and 
for which the unrestricted population variance is 
known is considered to be implicit or incidental 
selection, and that predictor variable is the implicitly 
selected one. 

This study describes three cases and associated 
formulas for estimating correlations in unrestricted 
populations. These three formulas are the ones 
derived or discussed in the seminal literature and 
are the formulas that have the most application in 
the current research. In the end, two of the formulas 
are actually employed in correcting correlations 
in this study. Assumptions underlying the three 
formulas include the conventional ones of linearity 
and homoscedasticity. Also assumed is that the 
selection of the restricted population from the 
unrestricted population is as described for the 
formulas being used. 

The three situations and formulas for each 
are noted below. In these formulas, $s$ (standard
deviation), $s^2$ (variance), and $r$ (correlation) are
statistics from the restricted population, and $S$, $S^2$,
and $R$ are the corresponding statistics from the
unrestricted population; $X_1$ is the predictor variable,
$X_2$ is the variable being predicted, and $X_3$ is a third
variable related to $X_1$.

Case 1. In this case, subjects are selected on the
basis of $X_1$, and the values of $s_1^2$, $s_2^2$, $S_2^2$, and $r_{12}$ are
known. Gulliksen (1950) describes this as the case
of incidental selection, because it is the variance of
the criterion variable that is known, and derives the
following formula:

$$R_{12} = \sqrt{1 - \left(1 - r_{12}^2\right)\frac{s_2^2}{S_2^2}}$$

Gulliksen cites Kelley (1923), Garrett (1947),
Guilford (1942), Crawford and Burnham (1946),
and Thorndike (1947) as including this formula or
equivalent versions of it. Also see Cureton (1951)
and Guilford (1965).

In predicting college success, the variance
of $X_2$, the success variable (e.g., first-year GPA), is
not known in the unrestricted population of all
high school graduates because not all high school
graduates completed the first year of college.
Consequently, the Case 1 formula cannot be applied
in this study.
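Although the Case 1 formula is not applied in this study, a minimal Python sketch may help make it concrete. The function name `correct_case1` and the input values are hypothetical illustrations and are not taken from the study; the sketch assumes only the formula as stated above.

```python
import math


def correct_case1(r12: float, s2_var: float, S2_var: float) -> float:
    """Case 1 sketch: estimate R12 when the criterion variance is known
    in both the restricted (s2_var) and unrestricted (S2_var) populations."""
    if s2_var <= 0 or S2_var <= 0:
        raise ValueError("variances must be positive")
    # The formula yields the magnitude of R12; the sign of r12 is not carried through.
    return math.sqrt(1.0 - (1.0 - r12 ** 2) * (s2_var / S2_var))


# Hypothetical inputs (not from the study): r12 = .40, s2^2 = 0.45, S2^2 = 0.90.
R12_case1 = correct_case1(0.40, 0.45, 0.90)
print(round(R12_case1, 4))
```

When the two variances are equal, the function returns the original correlation, which is a quick sanity check on the formula.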

Case 2. In this case, subjects are selected on the
basis of $X_1$, and the values of $s_1^2$, $s_2^2$, $S_1^2$, and $r_{12}$ are
known. Gulliksen (1950) refers to this as the case of
explicit selection on $X_1$ and derives the following
formula:

$$R_{12} = \frac{S_1 r_{12}}{\sqrt{S_1^2 r_{12}^2 + s_1^2 - s_1^2 r_{12}^2}}$$

Slightly different but equivalent formulas are given
by Guilford (1965) and Cureton (1951). Gulliksen
(1950) indicates that this formula was first derived
by Pearson (1903) and that slightly different
versions of it have been given by Kelley (1923),
Holzinger (1928), Thurstone (1931), Thorndike
(1947), Crawford and Burnham (1946), and others.

For example, if it is assumed that students are
selected for admission on the basis of a predictor
of college success, $X_1$, that the variance of the
predictor variable in the unrestricted population of
high school graduates, $S_1^2$, and the variances of the
predictor, $s_1^2$, and of the college success variable,
$s_2^2$, as well as the correlation between the two
variables, $r_{12}$, in the restricted range population are
known, then the correlation between the predictor
variable and the college success variable, $R_{12}$, for
the population of high school graduates can be
estimated using this formula.

A situation in which the Case 2 formula might 
be used is the one in which a test is given to a 
population of subjects and only those who score 
in the top, say, 40% on the test are selected for a 
program of instruction. At the end of the program, 
the selected subjects are given an achievement 
test on the subject matter of the instruction. The 
correlation, $r_{12}$, of the selection test, $X_1$, and the
achievement test, $X_2$, and the variances, $s_1^2$ and $s_2^2$,
of the two variables are known for the selected
subjects, those in the restricted population. Also
known is the variance, $S_1^2$, of all subjects who
took the selection test, those in the unrestricted
population. The Case 2 formula then would be used
to estimate the correlation, $R_{12}$, between the two
tests for the population of subjects who took the
selection test. This situation is illustrated by Berry
and Sackett (2008) in a study involving students at
41 colleges. They found a correlation of .35 between
SAT (Verbal + Math) score and first-year GPA. The
corrected correlation for the population of students 
who took the SAT was .47. These situations are 
similar to the one for which the Case 2 formula is 
used in this study. 
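The Case 2 correction for explicit selection can be sketched in a few lines of Python. The function name `correct_case2` and the numbers below are hypothetical and chosen only for illustration (a selected group whose predictor standard deviation is half that of the full group); they are not values from this study or from Berry and Sackett (2008).

```python
import math


def correct_case2(r12: float, s1_var: float, S1_var: float) -> float:
    """Case 2 sketch: explicit selection on the predictor X1.

    r12    -- correlation in the restricted population
    s1_var -- predictor variance in the restricted population
    S1_var -- predictor variance in the unrestricted population
    """
    if s1_var <= 0 or S1_var <= 0:
        raise ValueError("variances must be positive")
    S1 = math.sqrt(S1_var)
    denominator = math.sqrt(S1_var * r12 ** 2 + s1_var - s1_var * r12 ** 2)
    return S1 * r12 / denominator


# Hypothetical inputs: r12 = .40, restricted variance 25, unrestricted variance 100.
R12_case2 = correct_case2(0.40, 25.0, 100.0)
print(round(R12_case2, 4))
```

With equal restricted and unrestricted variances the function returns the observed correlation unchanged, and the corrected value grows as the unrestricted variance increases relative to the restricted one.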

Case 3. In this case, subjects are explicitly
selected on the basis of $X_3$, and $X_1$ is a third variable,
related to $X_3$, for which values are available for
subjects in the restricted population. It is desired to
estimate the correlation between $X_1$, the incidental
selection variable, and the success variable, $X_2$,
for the unrestricted population. In this case, the
values of $s_1^2$, $s_2^2$, $s_3^2$, $S_3^2$, $r_{12}$, $r_{13}$, and $r_{23}$ are known.

The formula for this case, given by Gulliksen (1950),
follows:

$$R_{12} = \frac{r_{12} - r_{13} r_{23} + r_{13} r_{23}\left(S_3^2 / s_3^2\right)}{\sqrt{\left[1 - r_{13}^2 + r_{13}^2\left(S_3^2 / s_3^2\right)\right]\left[1 - r_{23}^2 + r_{23}^2\left(S_3^2 / s_3^2\right)\right]}}$$

Gulliksen (1950) indicates that variants of this
formula appear in Pearson (1903) and Thorndike
(1947). The formula or its equivalent also appears in
Cureton (1951) and Guilford (1965).

For example, if it is assumed students are
selected on the basis of one predictor of success in
college, $X_3$, that the variance of that variable in the
unrestricted population is known, and that values
of a second predictor variable, $X_1$, are available for
the selected students, then the correlation of the
second predictor variable and the success variable,
$X_2$, in the unrestricted population can be calculated
from this formula.
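As a rough illustration of the Case 3 correction, the following Python sketch implements the three-variable formula. The function name `correct_case3` and the example inputs are hypothetical and are not drawn from the study data.

```python
import math


def correct_case3(r12: float, r13: float, r23: float,
                  s3_var: float, S3_var: float) -> float:
    """Case 3 sketch: explicit selection on X3, incidental selection on X1.

    r12, r13, r23 -- correlations in the restricted population
    s3_var        -- variance of X3 in the restricted population
    S3_var        -- variance of X3 in the unrestricted population
    """
    if s3_var <= 0 or S3_var <= 0:
        raise ValueError("variances must be positive")
    k = S3_var / s3_var  # ratio of unrestricted to restricted variance of X3
    numerator = r12 - r13 * r23 + r13 * r23 * k
    denominator = math.sqrt(
        (1 - r13 ** 2 + r13 ** 2 * k) * (1 - r23 ** 2 + r23 ** 2 * k)
    )
    return numerator / denominator


# Hypothetical inputs: r12 = .30, r13 = .50, r23 = .40, variances 25 and 100.
R12_case3 = correct_case3(0.30, 0.50, 0.40, 25.0, 100.0)
print(round(R12_case3, 4))
```

If the variance ratio is 1 (no restriction on $X_3$), the numerator reduces to $r_{12}$ and the denominator to 1, so the function returns the observed correlation.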

Gulliksen (1950) provides two additional three-variable
formulas for estimating correlations in
unrestricted populations if the variance of the
incidental selection variable in the unrestricted population
is known. These formulas are not given or discussed
here. Sackett and Yang (2000) provide descriptions
of 11 cases in which correlations in
unrestricted populations might be estimated from
data for restricted populations.
The three cases discussed above are included
among the 11. This paper (Sackett & Yang, 2000)
includes a comprehensive review of the literature
on contributions to the matter of estimating
population correlations from samples that have
been restricted due to any of several types of
selection. It is recommended to anyone seeking to
investigate further the topic of the present study.
The presentation by Thorndike (1949) is frequently
cited in the literature on the topic and also is
recommended to those interested in pursuing it
further.

The Data and Their Application 

The data for this study come from a population 
of first-time freshmen who entered a major 
research university with moderately selective 
admission standards in the fall 2008 semester, 
whose high school class percentile rank was 
50 or greater, who entered the fall semester as 
full-time, degree-seeking students, and who 
completed both semesters with complete data for 
the study variables. There are 3,668 students in 
this population. The variables collected for these 
students are: 

ACT-C - ACT Composite Score

HSCPR - High School Class Percentile Rank

NHSCPR - Normalized High School Class Percentile Rank

CCGPA - High School Core Course Grade Point Average

FYGPA - Freshman Year Grade Point Average

The HSCPRs are transformed into normalized
values (NHSCPRs) by converting the percentile ranks,
treated as cumulative percentages under the normal curve,
into z-values using a normal curve table and then
transforming the z-values to a scale with a mean of
50 and a standard deviation of 10. The NHSCPR
variable is used in the data analysis because it has 
more desirable statistical properties than the HSCPR 
one. The CCGPA is an average on a 4-point scale 
calculated from the high school academic core 
courses in English, mathematics, science, social 
science, and fine arts. The other variables are as 
traditionally defined. 
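The normalization of percentile ranks described above can be sketched as follows. This is an assumption-laden illustration: the function name `normalize_percentile_rank` is hypothetical, the inverse normal CDF from Python's standard library stands in for a printed normal curve table, and how ranks of exactly 0 or 100 were handled is not stated in the article, so the sketch simply rejects them.

```python
from statistics import NormalDist


def normalize_percentile_rank(percentile_rank: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to a
    normalized score with mean 50 and standard deviation 10."""
    if not 0.0 < percentile_rank < 100.0:
        # Assumption: boundary ranks have no finite z-value and are rejected here.
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100.0)  # z-value for the cumulative proportion
    return 50.0 + 10.0 * z


# A rank at the median maps to 50; a rank near the 84th percentile maps to about 60.
print(round(normalize_percentile_rank(50.0), 2))
print(round(normalize_percentile_rank(84.13), 2))
```

Because the normalized scale is defined to have a standard deviation of 10 over the full range of percentile ranks, its variance in a complete graduating population is 100 by construction.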

Some statistics for this population are shown in
Table 1. 

Table 1

Means, Standard Deviations, and Variances for Study
Variables for Students in the Study Population, N = 3,668

Statistic    ACT-C    NHSCPR    CCGPA    FYGPA
Mean         25.65    60.13     3.50     3.05
s             3.71     6.32     0.38     0.67
s^2          13.76    39.94     0.14     0.45



For the study, the correlations between each 
of the three predictor variables, ACT-C, NHSCPR, 
and CCGPA and the college success variable FYGPA 
are calculated. These statistics are descriptive of 
the restricted population for which the data are 
collected. The applicable formulas described in 
the preceding section are then used to estimate 
the values of the correlations for all high school 
graduates, those in the unrestricted population. 

The admission requirements for the subject 
university include minimum numbers of completed 
units of core subject areas of high school courses 
and standards based upon ACT-C and HSCPR. 
Specifically, students with an ACT-C of 24 or higher 
are admissible, and those with scores between 18
and 23 are deemed admissible on the basis of a 
sliding-scale combination of their ACT-C and HSCPR. 

In calculating the desired estimates of 
correlations for all high school graduates, it is 
assumed that enrolled students are selected on the 
basis of HSCPR or NHSCPR. In order that the data 
to be analyzed conform as much as possible to this 
assumption, the study population is restricted to 
students whose HSCPR or NHSCPR is 50 or greater. 

In other words, this restricted population is treated 
as having been selected on the basis of HSCPR or 
NHSCPR. Even with this restriction, the assumption 
is not completely correct. First, high school 
graduates "self-select" in deciding whether or not 
to apply to the subject university. Second, students 
are also selected on the basis of core course
requirements and ACT-C. Further, the population
of admitted students is also restricted to those 
who enroll full-time and complete the freshman 
year with complete data on the study variables. 
Consequently, the estimated correlations will be in 
error to some undetermined degree. However, the 
derived values are certainly more accurate estimates 
of the correlations for all high school graduates than 
are the values calculated for the enrolled student 
population. Gulliksen (1950) suggests that "In many
cases, however, it is clear that a given selection
test was one of the major items in the selection
procedure so that the results found by assuming
that selection was solely on the basis of the test will
not be far from the correct estimate" (p. 129).

The formulas for correcting correlations 
for restricted variances require that a standard 
deviation or variance of a relevant variable in the 
unrestricted population be known. For this study, 
the unrestricted population is that of all high school 
graduates. The variance of NHSCPR, the normalized 
values of HSCPR, for this population is $10^2$, or 100.
This is the value used to correct the prediction-of-success
correlations.

If the population of "college bound" high school 
graduates were of interest, the correlations for this 
population could be estimated using ACT-C scores 
and the variance of these scores in the population 
of ACT test-takers. For this study, however, the 
population of all high school graduates is of 
more interest than the population of high school 
graduates who have taken the ACT. Thus, this 
application of the formula is not explored. 

The degree of restriction for the study 
population is evident in the following: In recent 
years, the rate of college-going for the state in 
which the subject university is located and for 
the nation as a whole is around 60%. Thus, many 
students from the unrestricted population are 
clearly excluded from the study population. Next, 
approximately 14,500 prospective first-time
freshmen applied to the subject university for
fall 2008, around 12,300 were admitted, and 5,800
enrolled. The study population of 3,668 includes
those who enrolled full-time, completed both
semesters, and had complete data on the study
variables. This is a noteworthy reduction in size of
the population and clearly suggests a reduction in 
the variances of study variables. 

Results 

Correlations among the several study variables
for the study population
are contained in Table 2. These are the restricted
population correlations.



Table 2

Correlations Among Study Variables

Variable    NHSCPR    CCGPA    FYGPA
ACT-C       0.43      0.36     0.43
NHSCPR                0.78     0.49
CCGPA                          0.56




Correlation of ACT-C with FYGPA

To correct the restricted population correlation,
.43, the Case 3 formula is used. It is assumed that
students are selected on the basis of NHSCPR, $X_3$.
The predictor variable is ACT-C, $X_1$, and the variable
being predicted is FYGPA, $X_2$. The following values,
from Table 1 and Table 2 (with four significant digits),
are used in the formula:

$r_{12} = .4257$, $r_{13} = .4281$, $r_{23} = .4891$,
$s_1^2 = 13.77$, $s_2^2 = .4484$, $s_3^2 = 39.94$, and $S_3^2 = 100$.

The result is: $R_{12} = .5630$.
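As a check on the arithmetic, the substitution into the Case 3 formula can be written out step by step. The intermediate values below are rounded to four decimals and were computed from the inputs listed above, so small differences from the reported result are to be expected; this is an illustrative sketch rather than the authors' own working.

```latex
\begin{aligned}
\frac{S_3^2}{s_3^2} &= \frac{100}{39.94} \approx 2.5038 \\
\text{numerator} &= .4257 - (.4281)(.4891) + (.4281)(.4891)(2.5038) \approx .7406 \\
\text{denominator} &= \sqrt{\left[1 - .4281^2 + .4281^2(2.5038)\right]\left[1 - .4891^2 + .4891^2(2.5038)\right]}
  \approx \sqrt{(1.2756)(1.3597)} \approx 1.3170 \\
R_{12} &\approx \frac{.7406}{1.3170} \approx .562
\end{aligned}
```

To two decimal places this agrees with the reported value of .56.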

Correlation of NHSCPR with FYGPA

To correct the restricted population correlation,
.49, the Case 2 formula is used. For this correlation,
it is assumed that students are selected based on
the predictor variable, NHSCPR, $X_1$. The variable being
predicted is FYGPA, $X_2$. The following values, from
Table 1 and Table 2, are used:

$r_{12} = .4891$, $s_1^2 = 39.94$, $s_2^2 = .4484$, and $S_1^2 = 100$.
The result is: $R_{12} = .7575$.

Correlation of CCGPA with FYGPA 

To correct this restricted population correlation,
.56, the Case 3 formula is used. It is assumed
students are selected on the basis of NHSCPR. The
predictor variable is CCGPA, $X_1$, and the variable
being predicted is FYGPA, $X_2$. NHSCPR, $X_3$, is a third
variable related to CCGPA. The following values,
from Table 1 and Table 2, are used:

$r_{12} = .5582$, $r_{13} = .7829$, $r_{23} = .4891$,
$s_1^2 = .1445$, $s_2^2 = .4484$, $s_3^2 = 39.94$, and $S_3^2 = 100$.

The result is: $R_{12} = .8025$.

Discussion 

Table 3 contains the correlation of each 
predictor variable with FYGPA for the study 
population, the selected population of enrolled 
students, and the corresponding estimate of the 
correlation for the unrestricted population of 
high school graduates. The table also includes 
percentages of the variance of FYGPA that are
accounted for by the predictor variable for each
correlation. It is important to emphasize that the 
unrestricted population correlations are only 
estimates of the values that would be found were it 
possible to calculate them directly. To an unknown 
degree, the accuracy of the estimates is affected by 
the fact that the students in the study population 
were not selected purely on the basis of their HSCPR 
as assumed for the formulas used. 

Table 3

Summary of Restricted and Unrestricted Population
Correlations for Predicted FYGPA

             Restricted Population    Unrestricted Population
Predictor    r       % of Var         R       % of Var
ACT-C        0.43    18               0.56    32
NHSCPR       0.49    24               0.76    57
CCGPA        0.56    31               0.80    64


There are four major findings that can be 
gleaned from this study. First, by applying the 
formulas and correcting the correlations, the 
relationship between the predictor variables 
and the measure of success in college increased 
significantly. From a formulaic perspective, if 
the variance for NHSCPR is 39.94 for the study 
population (see Table 1) and the variance for
NHSCPR for all high school graduates is 100.0,
similar restrictions should be realized in the 
variances of other study variables that are related to 
NHSCPR. These restrictions in variances indicate that 
the correlations in the population of high school 
graduates should be higher than those for the 
population of enrolled students. Accordingly then, 
the results are that the correlations increased from 
.43 to .56 for the ACT-C, from .49 to .76 for NHSCPR, 
and from .56 to .80 for CCGPA. By and large, these 
increases in correlations are quite significant, 
especially for NHSCPR and CCGPA. This finding also 
reinforces the argument that the predictive ability 
of these college admission indicators is relatively 
robust particularly when the broader population of 
all high school graduates is considered. 

Second, the correlation of ACT-C with FYGPA 
in the study population is relatively modest at .43. 
For the unrestricted population, the estimated 
correlation of ACT-C with FYGPA is .56. This may not 
overwhelmingly justify the use of ACT-C as the sole 
criterion for college admission. At the same time, 
however, it is important to acknowledge that the 
percentage of variance in FYGPA explained by the 
ACT-C did increase by 78% (from 18% to 32%). This
increase is not trivial. 

Third, and as expected, the indicators of success 
in high school have higher correlations with the 
college success variable than does the admission 
test score. The differences between the correlation 
coefficients for these two types of predictor 
variables, .43 for ACT-C in contrast to .49 for NHSCPR 
and .56 for CCGPA, are fairly substantial. As noted 
earlier, this finding has been shared in several 
other studies using a wide variety of subjects and 
institutions (Hoffman, 2002; Munro, 1981; Zheng et 
al., 2002; among others). 

Fourth, the differences between the corrected 
and uncorrected correlations are greater for the 
high school success variables than for the admission 
test variable. In comparison to the restricted 
population, the unrestricted percentage of variance 
explained increases 138% for NHSCPR, 106% for 
CCGPA, and 78% for the ACT-C. It is not clear why 
the restrictions in range have a larger impact on the
correlations of the high school success variables
with FYGPA than the corresponding correlation 
involving ACT-C. Perhaps there is something about 
predicting college performance from high school 
performance that underlies the difference. The 
estimated correlations for the high school success 
predictors are significant and certainly reinforce 
the importance of these variables in college 
admission decisions. Furthermore, these findings 
provide a counter argument for those colleges and 
universities that have minimized the importance of 
or even dismissed these predictor tools in admission 
decisions. 
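The percentages cited in these findings follow from simple arithmetic on the correlations: the percentage of variance is the squared correlation, and the relative increase compares the rounded unrestricted and restricted percentages. The short Python sketch below reproduces that arithmetic from the four-digit correlations reported in the Results section (taking .4891 as the restricted NHSCPR-FYGPA correlation). The variable and function names are hypothetical, and the sketch takes the reported corrected correlations as given rather than recomputing them.

```python
# (restricted r, estimated unrestricted R) with FYGPA, as reported in the Results.
reported = {
    "ACT-C": (0.4257, 0.5630),
    "NHSCPR": (0.4891, 0.7575),
    "CCGPA": (0.5582, 0.8025),
}


def pct_variance(correlation: float) -> int:
    """Percentage of criterion variance accounted for, rounded to a whole percent."""
    return round(100 * correlation ** 2)


summary = {}
for predictor, (r, R) in reported.items():
    before = pct_variance(r)
    after = pct_variance(R)
    # Relative increase in the rounded percentage of variance accounted for.
    increase = round(100 * (after - before) / before)
    summary[predictor] = (before, after, increase)
    print(f"{predictor}: {before}% -> {after}% (increase of {increase}%)")
```

The relative increases are computed from the rounded whole percentages, which appears to be how the figures of 78%, 138%, and 106% in the text were obtained.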

Further Research 

Clearly, additional research on the effect of 
restriction of range on the correlations of predictor 
variables with indicators of college success is 
needed. The single institution study cannot be 
considered to be definitive. Results for colleges or 
universities with different degrees of selectivity may 
differ. Further, it would be desirable if a situation 
could be found where the assumptions of the 
correction formula are met more closely than in this 
study. 

Conclusion 

Restriction of range is clearly one reason that 
correlations calculated from enrolled student 
populations understate the true relationships 
between predictor variables and college success 
measures. The values of such correlations should 
not be the single criterion for decisions concerning
the use of the predictors in college admissions.
Other variables may also depress these correlations.
For example, Pike and Saupe (1992) found that high
school attended was a factor in the prediction of
success in college. In that study, it was found
that when high school attended was controlled, the
correlation increased. Unreliability in the predictor
and in the college success measures also can 
depress the correlations (Sackett et al., 2008). 

In sum, the true relationships between 
predictor variables and college success measures 
can be masked by restricted range as well as 
other extraneous variables. The present study
demonstrates the influence that restricted range
can have on this relationship and suggests that 
these predictor variables are probably more 
accurate than what is generally shared in the 
literature and in practice. This study will have been 
successful if it stimulates others to explore the use 
of the correction formulas to estimate correlations 
between predictor variables and indicators of 
success in college for unselected populations. 



Editor's Note: 

The article by Saupe and Eimers is a delight. It 
might well be titled "Back to Basics" as it addresses 
one of the foundational issues in parametric 
statistics, the impact of restricting the range of 
variables. In classical test theory, also known as 
Weak True Score Theory because of its limited
assumptions, the error variance is assumed to 
be consistent, so when the range of raw scores is 
restricted — which reduces the raw score variance — 
the true score variance is reduced. Hence, the scores 
become less reliable. The reduction of score range 
also reduces correlations, and this is the article's 
focus. The other part of the assumptions includes 
the linearity of the relationship that is the same for 
the included scores and the excluded segment. 
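
The point about reliability can be illustrated with a small sketch. It assumes the classical definition of reliability as one minus the ratio of error variance to observed score variance, together with the constant error variance described above. The function name `restricted_reliability` and the unrestricted reliability of .90 are hypothetical; the two variances are the NHSCPR values used in the article.

```python
def restricted_reliability(rel_unrestricted: float,
                           s_var: float, S_var: float) -> float:
    """Reliability in the restricted group, assuming the error variance is the
    same in the restricted and unrestricted populations."""
    if s_var <= 0 or S_var <= 0:
        raise ValueError("variances must be positive")
    error_var = S_var * (1.0 - rel_unrestricted)  # error variance implied by the full population
    return 1.0 - error_var / s_var


# Hypothetical unrestricted reliability of .90; variances of 100 and 39.94.
rel_restricted = restricted_reliability(0.90, 39.94, 100.0)
print(round(rel_restricted, 4))
```

Under these assumptions the same error variance is a larger share of the smaller observed variance, so the reliability in the restricted group is lower.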

In cases where the restriction is at the extremes, 
like excluding the bottom 10% of a distribution or
selecting the top 25% of a distribution, one might 
want to empirically test the assumption, where 
feasible. 

The first question the article raises, however, is 
about the use of test scores. It suggests the use of 
odds ratios, which are common for the independent 
variables in logistic regression but do not seem 
to be used that much as a dependent variable.
This would seem to be an interesting alternative
for some of our research, especially where the 
predicted result is a probability. 

The article presents a discussion of how ranges 
of measures are restricted using the admissions 
decision. It would seem that there are other
situations, like selecting a subgroup to engage in an
intervention based on some criterion or selecting 
a group of students to be surveyed based on some 
criterion. This also raises the issue of the relationship 
of the selection criterion to the outcome(s) of 
interest. If we do select groups based on a criterion,
should we look at the correlations between the 
criterion and the key variables in the study? 

There are a couple of points to make in this part 
of the discussion. First, while the earlier references 
include the classical derivations, the article by 
Sackett and Yang is, as the authors note, very good 
and includes a number of situations including cases 
when the restriction comes by deleting the center 
of the score distribution. They also show some of 
the restricted situations graphically. A second point 
is that the authors use a set of normalized scores 
for the high school rank scores. This is helpful since 
the standard deviation of the normalized scores 
is known by definition. It also is more appropriate 
since the percentile rank of high school scores is not 
an interval scale. 

The authors clearly demonstrate that the 
correlations between attenuated distributions 
can significantly understate the correlation that 
would be obtained if the full range of scores were 
used. It also seems to indicate that the increase is 
proportional to the correlation of the attenuated 
measure and the measure(s) in the correlation. The 
largest increase seems to occur when the restriction 
is explicit on one of the measures in the correlation. 
In their suggestion for further research, they do 
note the need to replicate their results. 

Another use of corrections, which would be a 
bit of an approximation, as was their case, would be 
to use the correction in support of a meta-analysis.
For example, even though they mention that the 
ACT takers do not represent the population of high 
school graduates, it would still seem interesting 
to use a known ACT score deviation to correct 
reported correlations across studies based on the 
standard deviation of ACT in the individual study. 

As in all good research, Saupe and Eimers raise 
more questions from their work and, in the process, 
remind us of some of the basic work in our field. 